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Abstract:  Current  analytical  techniques  applied  to  functional  magnetic  resonance  imaging  (fMRI)  data 
require  a  priori  knowledge  or  specific  assumptions  about  the  time  courses  of  processes  contributing  to  the 
measured  signals.  Here  we  describe  a  new  method  for  analyzing  fMRI  data  based  on  the  independent 
component  analysis  (ICA)  algorithm  of  Bell  and  Sejnowski  ([1995]:  Meural  Comput  7:1129-1159).  VVe 
decomposed  eight  fMRI  data  sets  from  4  normal  subjects  performing  Stroop  color-naming,  the  Brown  and 
Peterson  word/number  task,  and  control  tasks  into  spatially  independent  components.  Each  component 
consisted  of  voxel  values  at  fixed  three-dimensional  locations  (a  component  "map”),  and  a  unique 
associated  time  course  of  activation.  Given  data  from  144  time  points  collected  during  a  6-min  trial,  ICA 
extracted  an  equal  number  of  spatially  independent  components.  In  all  eight  trials,  ICA  derived  one  and 
only  one  component  with  a  time  course  closely  matching  the  time  course  of  40-sec  alternations  between 
experimental  and  control  tasks.  The  regions  of  maximum  activity  in  these  consistently  task-related 
components  generally  overlapped  active  regions  detected  by  standard  correlational  analysis,  but  included 
frontal  regions  not  detected  by  correlation.  Time  courses  of  other  ICA  components  were  transiently 
task-related,  quasiperiodic,  or  slowly  varying.  By  utilizing  higher-order  statistics  to  enforce  successively 
stricter  criteria  for  spatial  independence  between  component  maps,  both  the  ICA  algorithm  and  a  related 
fourth-order  decomposition  technique  {Comon  [1994]:  Signal  Processing  36:11-20)  were  superior  to 
principal  component  analysis  (PCA)  in  determining  the  spatial  and  temporal  extent  of  task-related 
activation,  For  each  subject,  the  time  courses  and  active  regions  of  the  task-related  IC.4  components  w'ere 
consistent  across  trials  and  were  robust  to  the  addition  of  simulated  noise.  Simulated  movement  artifact 
and  simulated  task-related  activations  added  to  actual  fIvIRI  data  were  clearly  separated  by  the  algorithm. 
ICA  can  be  used  to  distinguish  betw-een  nontask-related  signal  components,  movements,  and  other 
artifacts,  as  well  as  consistently  or  transiently  task-related  fMRI  activations,  based  on  oniv  weak 
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assumptions  about  their  spatial  distributions  and  without  a  priori  assumptions  about  their  time  courses. 
ICA  appears  to  be  a  highly  promising  method  for  fhe  analysis  of  fMRJ  data  from  normal  and  clinical 
populations,  especially  for  uncovering  unpredictable  transient  patterns  of  brain  activity'  associated  with 
performance  of  psychomotor  tasks.  Hum.  Brain  Mapping  6:1 60-288, 1 998.  c  I99g  Wikj-Uss,  Inc. 
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INTRODUCTION  from  each  voxel.  These  voxels,  whose  signals  are  posi- 


Many  current  functional  magnetic  resonance  imag¬ 
ing  (flvlRI)  experiments  use  a  block  design  in  w'hich  the 
subject  is  instructed  to  perform  experimental  (E)  and 
control  (C)  tasks  in  an  alternating  sequence  of  20-40- 
sec  blocks  (e.g.,  CECECEC. . .).  During  such  a  trial, 
signals  from  thousands  of  volume  elements  (voxels)  in 
each  of  several  brain  slices  are  typically  acquired  every 
1-3  sec.  The  resultant  time  series  recorded  for  each 
voxel  may  contain  a  complicated  mixture  of  high-  and 
low-frequency  activity  (Fig.  1),  w'hich  is  most  probably 
produced  by  a  medley  of  local  or  spatialty  distributed 
processes,  including  task-related  and  nontask-related 
hemodynamic  brain  tissue  activations  as  well  as  mo¬ 
tion  or  machine  artifacts.  This  tangled  mixture  of 
signals  presents  a  formidable  challenge  for  analyttcal 
methods  attempting  to  tease  apart  task-related  changes 
from  the  disparate  time  courses  of  5,000-25,000  voxels. 

Changes  in  fMRI  signal  (including  blood  oxygen 
level-dependent  (BOLD)  contrast  [Ogawa  et  al.,  1992]) 
related  to  alternating  performance  of  experimental  and 
control  tasks  have  been  analyzed  by  a  number  of 
techniques,  including  subtraction,  correlation,  and  time- 
frequency  analyses,  and  have  been  tested  statisbcally 
using  t-tests  [Kw'ong  et  al.,  1992],  analysis  of  variance/ 
covariance  (ANOVA/ANCOVA)  (Friston,  1996),  and 
nonparametric  Komolgorov-Smimov  tests  [Stuart  and 
Ord,  1991;  Kwong,  1995]. 

Subtraction  or,  rhore  generally,  correlation  techniques 
[Bandettini  et  al.,  1993]  are  based  on  the  assumption  that 
v^oxels  indexing  brain  regions  participating  in  the  cogni¬ 
tive  processing  of  the  given,  experimental  and  control  tasks 
should  show  different  ftv'IRl  signal  levels  during  the 
performance  of  these  tasks.  Correlation  techniques  exploit 
a  priori  knowledge  of  the  expected  time  course  of  task- 
related  changes  in  the  signal  to  determine  their  intensity 
and  spatial  extent.  A  reference  function  is  created  by  convolv¬ 
ing  the  block  design  of  tire  behavioral  experiment  (CE¬ 
CECEC.  ,  .)  with  a  fixed  model  of  the  hemodynamic 
response  function  (an  estimate  of  the  flViRI  signal  changes 
evoked  b)'  a  brief  burst  of  neural  activity).  This  reference 
function  is  then  correlated  with  the  time  series  recorded 


tively  correlated  with  the  reference  function  above  a 
preselected  threshold,  are  designated  "areas  of  activation." 
Although  this  method  is  both  computationally  simple  and 
reasonabh'  effective,  it  has  several  major  drawbacks.  Even 
in  areas  of  activ'ation,  the  task-related  signal  changes  are 
ty'pically  small  f<10%),  suggesting  that  other  time- 
vary'ing  phenomena  must  produce  the  bulk  of  the  mea¬ 
sured  signals.  These  phenomena  can  be  conceptualized  as 
multiple  concurrent  "cortvponent  processes,"  each  having  a 
separate  time  course  and  spatial  extent  and  each  produc¬ 
ing  simultaneous  changes  in  the  fMRI  signals  of  many 
voxels.  Other  component  processes  may  not  be  com¬ 
pletely  uncorrelated  with  task-related  changes,  and  so 
may  tend  to  mask  the  effects  of  activations  related  to 
task-performance,  reducing  the  sensitivity  and  specificity 
of  correlational  analysis.  If  the  nontask-relev'ant  compo¬ 
nent  processes  are  monotonic  and  linear,  simple  linear 
detrending  [Bandettini  et  al.,  1993]  can  be  expected  to 
enhance  the  accuracy  of  correlational  analysis.  Howrever, 
the  time  courses  of  processes  related  to  changes  in  arousal, 
task  strategy',  head  position,  machine  artifacts,  or  other 
endogenous  processes  occurring  during  a  trial  may'  not 
resemble  simple  linear  or  nonlinear  functions. 

More  general  ANOVA-like  approaches  [Friston, 
1996],  including  statistical  parametric  mapping  (SPM) 
[Friston,  1995],  test  the  signal  at  each  voxel  using 
univariate  measures  (e.g.,  t-tests,  or  f-tests)  under  the 
null  hy'pothesis  that  the  values  are  distributed  under  a 
know'n  probability  distribution  (typically  Gaussian). 
Voxels  in  which  the  signal  difference  betvv'een  the  task 
and  control  conditions  exceeds  a  predefined  lev'el  of 
significance  are  selected  as  active,  resulting  in  a  distrib¬ 
uted  spatial  image  giving  anatomical  areas  of  signifi¬ 
cant  task-related  actuation  difference.  Using  this  tech¬ 
nique,  if  is  possible  to  test  multiple  factors  that  may 
contribute  to  changes  in  the  fMRI  signals  in  addition  to 
the  task  design.  However,  ANO\‘A-Iike  methods  are 
based  on  the  assumptions,  tenuous  for  fMRI  data,  that: 
1}  the  observ'ations  have  a  known  distribution  (e.g., 
Gaussian),  2)  the  v'ariances  and  covariances  between 
repeated  measurements  are  equal,  3)  the  time  courses 
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Figure  !. 

BOLD  signal  complexity  and  task  reference  function,  a:  Time 
courses  of  10  randomly  selected  voxels  from  a  6-min  fMRI  trial  of 
the  Scroop  color-naming  task  illustrate  the  typical  complexity  of 
BOLD  signals,  b:  Convolving  an  a  priori  estimate  of  the  hemody¬ 
namic  response  function  with  the  square-wave  function  represent¬ 
ing  the  task  block  structure  of  the  trial,  alternating  experimental 
{Exp)  and  control  (Con)  blocks  (upper  trace)  produce  the  refer¬ 
ence  function  for  the  trial  (bottom  trace). 


of  different  factors  affecting  the  variance  of  the  fMRI 
signal  can  be  reliably  estimated  in  advance,  and  4)  the 
signals  at  different  voxels  are  independent.  Signal 
distributions  can  be  made  more  Gaussian  by  spatial 
and  temporal  smoothing,  but  this  smoothing  also 
degrades  the  temporal  and  spatial  resolution  of  the 
data. 

Time-frequency  analyses  describe  the  signal  re¬ 
corded  from  each  voxel  in  the  frequency  domain  and 
may  be  useful  for  distinguishing  between  physiologi¬ 
cal  pulsatile  and  other  repetitive  artifacts  known  to  be 
present  in  fMRI  data  [Mitra  et  ai.,  1997|.  Such  tech¬ 
niques  assume  that  signal  change  produced  by  task 
performance  and  other  sources  of  physiological  inter¬ 
est  have  frequency  spectra  different  from  other  causes 
of  variability  in  the  data.  Many  of  these  techniques 
assume  periodicity  in  the  hme  courses  of  the  compo¬ 


nent  sources,  which  may  not  be  v^alid,  although  wave¬ 
let  techniques  currently  being  explored  might  possibly 
relax  this  requirement  (Brammer  etal,  1997]. 

Correlational,  time-frequency,  and  ANOVLA-based 
methods  share  an  inherent  weakness  common  to 
univariate  techniques  currently  used  for  analysis  of 
fMRI  data:  they  do  not  attempt  to  extract  the  intrinsic 
structure  of  the  data.  This  could  be  a  particularly 
significant  drawback  in  cases  where  accurate  a  priori 
models  of  fMRI  signal  changes  in  response  to  experi¬ 
mental  events  are  not  known  or  may  not  be  constant 
across  all  voxels,  e.g„  in  data  from  patient  populations 
with  pathological  brain  conditions,  or  from  subjects 
performing  complex  learning  tasks.  Another  draw¬ 
back  of  ANOVA-based  and  correlational  measures  is 
that  they  tv'picajly  require  grouping  or  averaging  data 
ov^er  several  task/control  blocks.  This  reduces  their 
sensi.tivtty  for  detecting  transient  task-related  changes 
in  the  fMRI  signal,  and  makes  them  insensitive  to 
significant  changes  not  consistently  time-locked  to  the 
task  block  design.  These  could  include  changes  in 
strategy  by  the  subject  during  the  test  period,  changes 
associated  with  learning  or  habituation  of  task  perfor¬ 
mance,  with  fatigue,  or  with  other  processes  whose 
time  courses  cannot  be  predicted  in  advance  by  the 
experimenter.  Univariate  techniques  also  ignore  rela¬ 
tionships  between  voxels,  hindering  the  detection  of 
brain  regions  acting  as  functional  units  during  the 
experiment. 

Principal  component  analysis  (PCA)  has  been  pro¬ 
posed  as  a  w'ay  to  isolate  functional  patterns  in 
functional  imaging  data  [Moeller  et  al.,  1991],  This 
technique  first  measures  the  tendency  of  signals  at  all 
possible  pairs  of  voxels  to  covary,  and  then  finds  the  ’ 
orthogonal  spatial  patterns  or  eigenimages  capturing 
the  greatest  variance  in  the  data.  The  first  eigenimage 
represents  the  largest  source  of  variance  between  pairs 
of  voxels,  the  second  eigenimage  represents  the  largest 
source  of  residual  variance  orthogonal  to  the  first 
eigenimage,  and  so  on.  Normally,  the  number  of 
principal  components  required  to  adequately  repre¬ 
sent  the  data  to  a  specified  level  of  accuracy  is  much 
smaller  than  the  original  dimension  of  the  data  [Jack- 
son,  1991],  and  thus  PCA  can  provide  a  useful  method 
for  reducing  data  dimensionality  However,  if  task- 
related  fMRI  changes  are  only  a  small  part  of  the  total 
signal  variance,  retaining  the  orthogonal  eigenimages 
capturing  the  greatest  variance  in  the  data  may  reveal 
little  information  about  task-related  activations  or 
other  processes  of  interest.  Additionally,  if  during  an 
fMRI  experiment  numerous  voxels  become  simulta¬ 
neously  activated,  component  analysis  methods  based 
solely  on  voxel-pair  relationships  or  covariances  may 
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not  capture  their  overall  patterns  of  association.  These 
shortcomings  suggest  the  desirability  of  a  general 
fMRI  analytical  technique  capable  of  extracting  the 
intrinsic  spatiotemporal  structure  of  the  data  without 
the  aforementioned  limitations  associated  with  PCA 
and  other  existing  analytical  tools. 

Here  we  describe  a  new  technique  for  the  analysis  of 
f.MRI  data  based  on  the  statistical  method  of  indepen¬ 
dent  component  analysis  (ICA)  [Comon,  1994;  Bell  and 
Sejnowski,  1995].  It  potentially  allows  the  extraction  of 
both  transient  and  consistently  task-related,  as  well  as 
physiologically-relevant  nontask-related,  and  various 
artifactual  components  of  the  obsen-'ed  fMRI  signals. 

INDEPENDENT  COHPONENT  ANALYSIS 

Functional  organization  of  the  brain  is  based  on  two 
complementary  principles,  localization  and  connection- 
ism  [Phillips  et  al,  1984].  Localization  implies  that  each 
psychomotor  function  is  performed  principally  in  a 
srnall  set  of  brain  areas.  This  principle  derives  origi¬ 
nally  from  clinical  experience  where  a  restricted  locus 
of  damage  to  the  nenmus  system  could  usually  be 
inferred  from  a  specific  pattern  of  deficits  demon¬ 
strated  by  a  subject  [Gardner,  1975].  Occasionally,  the 
locus  of  the  lesion  cannot  accurately  be  directly  deter¬ 
mined  by  the  pattern  of  deficits,  as  in  the  clinical 
"disconnection  syndromes"  (e.g.,  alexia  without 
agraphia  [Duffield  et  al,  1994;  Quint  and  Gilmore, 
1992]  and  pure  word  deafness  [Takahashi  et  al.,  1992] ), 
because  the  lesion  interrupts  connections  betw'een 
macroscopic  loci  required  to  perform  some  psychomo¬ 
tor  task.  'Phis  demonstrates  the  compIementar\-  prin¬ 
ciple  of  connectionism  that  posits  that  the  brain  regions 
involved  in  a  given  psychomotor  function  may  be 
widely  distributed,  and  thus  the  brain  activity  re¬ 
quired  to  perform  a  given  task  may  be  the  functional 
integration  of  activity  in  multiple  macroscopic  loci  or 
distinct  brain  systems  (this  is  a  different  sense  of  the 
term  "connectionism"  from  that  used  to  describe 
neural  network  models). 

Consistent  with  these  principles,  we  suggest  that  the 
multifocal  brain  areas  activated  by  performance  of  a 
psychomotor  task  should  be  unrelated  to  the  brain 
areas  whose  signals  are  affected  by  artifacts,  such  as 
physiological  pulsations,  subtle  head  movements,  and 
machine  noise  which  may  dominate  fMM  experi¬ 
ments.  Each  of  these  separate  processes  may  be  repre¬ 
sented  by  one  or  more  spatiaily-independent  compo¬ 
nents,  each  associated  with  a  single  time  course  of 
enhancement  and/or  suppression  and  a  component 
map  (Fig.  2).  We  assume  the  component  maps,  each 
specified  by  a  spatial  distribution  of  fixed  values  (one 
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Figure  2. 

Schematic  of  fHRi  data  decomposed  into  independent  connpo- 
nents.  Each  independent  component  produced  by  the  ICA  algo¬ 
rithm  consists  of  a  spatial  distribution  of  voxel  values  (“component 
map"),  and  an  associated  time  course  of  activation.  The  four 
schematic  component  maps  show  voxels  participating  most  ac¬ 
tively  in  each  of  four  hypothetical  components.  Under  ICA.  the 
signal  observed  at  a  given  voxel  Is  modeled  as  a  sum  of  the 
contributions  of  all  the  independent  components.  The  amount 
each  component  contributes  to  the  data  is  determined  by  the 
outer  product  of  the  voxel  values  in  its  component  map  with  the 
activation  values  in  its  time  course.  Note  that  active  areas  of 
statisbcaily  independent  map  value  distributions  may  be  partially 
overlapping. 


at  each  voxel),  represent  possibly  overlapping,  multifo¬ 
cal  brain  areas  of  statistically  dependent  fMR]  signal 
influence.  Furthermore,  we  presume  that  the  compo¬ 
nent  map  distribuHons  are  spatmlly  independent,  and 
hence  uniquely  specified.  This  means  that  if  Pk(Ci,) 
specifies  the  probability  distribution  of  the  voxel  val¬ 
ues  Ck  in  the  k*  component  map,  then  the  joint 
probability  distribution  of  all  n  components  factorizes: 


p(C„  C,, . . . ,  C„)  ==  n  Pk(Ck) 


where  each  of  the  component  maps  is  a  vector  (Cki, 
i  =  1,2,..,  M),  and  M  is  the  number  of  voxels. 


♦  163  «■ 


Note  that  this  is  a  much  stronger  criterion  than 
saying  that  the  voxel  values  between  pairs  of  compo¬ 
nents  are  merely  uncorrelated,  i.e., 

M 

Ci  •  Cj  =  X  C,fcC,k  -  0,  for  i  #  j  ai 

k-"l 

since  Equation  (1)  implies  that  higher-order  correla¬ 
tions  are  also  zero. 

The  maps  will  be  independent  if  active  voxels  in  the 
maps  are  sparse  and  mostly  nonoverlapping  (Mc- 
Keown  et  ai.,  1998],  although  in  general  some  overlap 
will  occur.  We  further  assume  that  the  observed  flVlRI 
signals  are  the  linear  sum  of  the  contributions  of  the 
individual  component  processes  at  each  voxel.  With 
these  assumptions,  the  flVlRl  signals  recorded  during 
the  performance  of  psychomotor  tasks  can  be  decom¬ 
posed  into  a  number  of  independent  component  maps 
and  their  associated  component  activation  waveforms, 
using  the  ICA  algorithm  given  below  (Figs.  2,  3).  No  a 
priori  assumptions  need  be  made  about  the  time 
courses  of  activation  of  the  different  components,  or 
whether  a  given  component  is  activated  by  specific 
psychophysiological  systems  or  is  related  to  machine 
noise  or  other  artifacts. 

These  ideas  can  be  expressed  rigorously  by  writing  a 
matrix  equation  relating  the  component  maps  and 
their  time  courses  to  the  measured  tVIRI  signals.  If  the 
map  voxel  values  for  each  of  the  components  are 
known  and  placed  in  separate  rows  of  matrix  C^i,  then 
a  mixing  matrix,  Mij,^  can  specify  the  time-varying 
contributions  of  each  component  map  to  the  measured 
flVIRI  signals  (Fig,  3); 

Si 

X,.  =  X  13) 

k"  I 

Decomposing  obser\'ed  fMRI  signals  into  statisti- 
callv  independent  component  maps  without  prior 
knowledge  of  their  spatial  extents  or  time  courses  of 
activation  is  a  "blind  separation"  problem  [Jutten  and 
Herault,  1991  j.  The  independent  component  analysis 
(ICA)  algorithm  of  Bell  and  Sejnovvski  [1995],  an 
iterative  unsupervdsed  neural  network  learning  algo¬ 
rithm  based  on  information-theoretic  principles,  can 
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Figure  3. 

tMRI  data  as  a  mixture  of  independent  components.  The  mixing 
matrix  M  specifies  the  relative  contribution  of  each  component  at 
each  time  point.  ICA  finds  an  unmtxing  matrix  that  separates  the 
observed  component  mixtures  into  the  independent  component 
maps  and  time  courses. 

nent  maps  and  time  courses  of  activation  can  be 
computed. 

The  matrix  of  component  maps  is  computed  by 
multiplying  the  observed  data  matrix  X  by  VV, 

C,i  -  X  (4a) 

w^here  Wji^  is  the  unmixing  matrix  derived  from  ICA, 
Cij  is  the  value  of  the  j*  voxel  in  the  i*  component  map, 
Xkj  is  the  k*  time  point  of  the  j*  voxel,  and  the 
summation  runs  over  the  N  time  points  of  the  fMRl 
input  data.  In  matrix  notation,  this  can  be  written 
simply  as 

C  =  WX  (4b) 

where  X  is  the  fMRI  signal  data  matrix,  W  is  the 


perform  blind  separation  of  input  data  into  the  linear 
sum  of  time-vary'ing  modulations  of  m.aximally  inde¬ 
pendent  component  maps.  The  ICA  algorithm  itera¬ 
tively  determines  the  unknown  unmixing  matrix  W,  a 
possibly  linearly  scaled  and  permuted  version  of  the 
inverse  of  the  mixing  matrix,  M,  from  which  the  compo 


unmixing  matrix,  and  C  is  the  matrix  of  component 
map  voxel  values.  Note  that  W  is  a  square  matrix  of 
full  rank,  so  its  inverse  W'  ‘  is  well-defined.  Although  a 
nonlinearity  is  used  by  the  algorithm  in  the  determina¬ 
tion  of  VV  (see  below),  VV  itself  provides  a  linear 
decomposition  of  the  data. 
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Reconstruction  of  the  data  from  the  independent 
components  is  accomplished  by 

N 

Xii  =  2  (5ai 

k=1 

where  XJj  is  the  reconstructed  data  at  the  i*  time  point 
of  the  j*  voxel  and  the  summation  runs  over  the  N 
time  points  of  the  fMRl  input  data.  In  matrix  notation, 

X’  =  W"’C.  (5bi 

The  data  can  be  perfectly  reconstructed  when  W’  ’  == 
M,  i.e.,  X'  ==  W"  'C  =  MC  =  X.  The  first  column  of  VV"’ 
gives  the  time  course  of  modulation  of  the  first  compo¬ 
nent  map,  the  second  column  gives  the  time  course  of 
the  second  component  map,  and  so  on.  The  ICA 
method  can  extract  a  number  of  independent  compo¬ 
nents  up  to  the  number  of  time  points  in  the  data,  each 
having  a  map  that  does  not  change  during  the  course 
of  a  trial  and  a  unique  associated  time  course  of 
activation.  The  distributions  of  voxel  values  in  the 
component  maps,  C,  are  as  statistically  independent  as 
possible,  while  the  component  time  courses  (contained 
in  may  be  correlated.  The  order  of  the  rows  of  W, 

and  hence  of  the  calculated  ICA  components,  is  not 
meaningful  and  may  vary'  betw'een  repeated  analyses 
of  the  same  data.  It  is  useful  therefore  to  rank  order  the 
components  by  the  extent  of  their  contribution  to  the 
original  data. 

Rank  ordering  of  the  components  is  complicated  by 
the  fact  that  the  different  ICA  component  time  courses 
contained  in  are,  in  general,  non  orthogonal  so 
that,  unlike  PCA,  the  variances  explained  by  each 
component  will  not  sum  to  the  variance  of  the  original 
data.  The  contribution  each  component  makes  to  the 
magnitude  of  the  original  data,  can  be  estimated  by 
the  root  mean  square  (RMS)  of  the  data  set  recon¬ 
structed  solely  from  this  component,  i.e.,  from  Equa¬ 
tion  (5b)  w'ith  C  having  one  nonzero  row  correspond¬ 
ing  to  the  appropriate  component.  Alternatively,  the 
contribution  can  be  considered  the  RMS  error  intro¬ 
duced  per  data  point  when  the  data  is  reconstructed 
without  this  component.  Thus: 

j  N  M  \1 

■y  ~  — -I'y  y'  A? r 

I  i  1^1 

where  yi  is  the  contribution  to  the  data  from  the 
component,  N  is  the  number  of  time  points,  .M  is  the 
number  of  brain  voxels,  and  -Aj,  is  an  (N  by  M)  matrix 


.computed  from  the  outer  product  of  the  i*  component 
map  and  i*  column  of  W“l  le., 

A-jk  == 

Each  ICA  component  map  is  described  by  a  distribu¬ 
tion  of  values,  one  for  each  voxel.  These  values 
represent  the  relative  amount  a  given  voxel  is  modu¬ 
lated  by  the  activation  of  that  component.  To  find  and 
display  voxels  contributing  significantly  to  a  particular 
component  map,  the  map  values  may  be  scaled  to 
z-scores  (the  number  of  standard  deviations  from  the 
map  mean).  Voxels  whose  absolute  z-scores  are  greater 
than  some  threshold  (e.g.,  jz  >  2)  can  be  considered  to 
be  "active”  voxels  for  that  component.  In  this  case,  the 
z-scores  are  used  for  descriptive  purposes  and  have  no 
definite  statistical  interpretation.  Negative  z-scores 
indicate  voxels  whose  fMRI  signals  are  modulated 
opposite  to  the  time  course  of  activation  for  that  compo¬ 
nent. 

In  summary,  unlike  other  methods  of  fMRl  analysis 
that  begin  with  a  matrix  of  Pearson  product-moment 
correlations  [Moeller  et  al,  1991],  ICA  utilizes  a  much 
stronger  criterion  for  statistical  independence.  ICA 
decomposes  the  observed  fMRl  data  into  maps  of 
activities  that  are  as  spatially  independent  as  possible 
and  provides  a  unique  representation  of  the  data  (up  to 
scaling  and  permutation). 

THE  ICA  ALGORITHM 

Under  the  assumption  that  component  processes 
can  be  represented  by  differentially  activated  spatially 
sparse  and  spatially  independent  maps,  and  that  the 
sum  of  their  activations  equals  the  observ'ed  data,  an 
unmixing  matrix  VV  can  be  determined  using,  a  statisti¬ 
cal  method  based  on  the  "infomax"  principle  [Bell  and 
Sejnowski,  1995].  In  information  theoiyv  the  probability 
of  a  message  and  its  informational  content  are  in- 
verselv  related.  More  formally,  the  mean  uncertainty' 
or  entropy  associated  with  a  set  of  messages  in  discrete 
form  is 

H(X)  =  -2  PklogPk  <8) 

k 

where  pi,  is  the  probability  of  the  k-^  event. 

The  joint  entropy  of  two  variables  is  defined  by  H(X, 
Y)  =■  H{X)  -t  H(Y)  -  I(X,  Y),  where  I(X,  Y)  =  H(X}  - 
H(XiY)  is  the  mutual  information,  interpreted  as  the 
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redundancy  beUveen  X  and  Y  or,  alternatively  as  the 
reduction  in  uncertainty  of  one  variable  (e.g.,  X)  due  to 
the  obsen-ation  of  the  other  variable  (Y).  iCA  attempts 
to  maximize  the  joint  entropy  of  suitably  transformed 
(see  below)  component  maps,  and  in  so  doing  reduces 
the  redundancy  between  the  distributions  of  map 
values  for  different  components.  This,  in  effect,  results 
in  blind  separation  of  the  recorded  fMRI  signals  into 
spatially  independent  components.  See  Beil  and  Se- 
jnovvski  [1995j  for  a  more  detailed  discussion  of  the 
training  process. 

In  brief,  the  algorithm  initializes  VV  to  the  identity 
matrix  (I),  then  iteratively  attempts  to  maximize  H(y), 
w'here  y  =  g(C),  C  =  WX*,  and  g{)  is  a  specified 
nonlinear  function.  Here,  X^  is  a  "sphered"  version  of 
the  data  matrix  defined  by 


where 


changing  appreciably  (e.g.,  root  mean  square  change 
for  all  elements  <10  '*’). 

COHON’S  FOURTH-ORDER  TECHNIQUE  FOR 
INDEPENDENT  COMPONENT  ANALYSIS 

Common  [1994]  defined  the  concept  of  independent 
component  analysis  (ICA)  as  determining  a  linear 
transformation,  VV,  such  that  application  to  decorre- 
lated  input  data  resulted  in  approximately  maxim.al 
statistical  independence  between  outputs  (in  this  case, 
component  maps).  He  demonstrated  that  W'  could  be 
estimated  by  maximizing  a  contrast  function,  4>{VV), 
using  a  computationally  intensive  method  that  itera¬ 
tively  updated  VV  using  all  data  points  to  estimate 
<i>(W)  at  each  iteration. 

This  method  finds  a  linear  transformation,  VV,  which 


maximizes 


2(XXT)- 


and  (XXT)  is  the  N  X  N  covariance  matrix  of  the  data  vvhere 
matrix  X, 

The  nonlinear  function  g(),  which  provides  neces-  _  4mlm(  -  3(m()‘  +  12mJ,(m-)-  ~  6(m\)T  (15) 

sary  higher-order  statistical  information,  is  chosen 

here  to  be  the  logistic  function  jg  ph  moment  defined  as 


which  biases  the  algorithm  towards  finding  spatially 
sparse  component  maps  with  relatively  few  highly 
active  voxels  [McKeown  et  al.,  1998]. 

The  elements  of  W  are  updated  using  small  batches 
of  data  vectors  drawn  randomly  from  IXjj  without 
substitution,  according  to: 

/aH(v)\ 

_^VV  =  -e  W^W  =  e(I  +  (12) 


where  e  is  a  learning  rate  (typically  near  0,01)  and  the 
vector  V  has  elements 


a  IdVi 


=  (1  -  2y,). 


The  VV^W  term  in  Equation  (12),  first  proposed  by 
Amari  et  al,  [1996],  avoids  matrix  inversions  and 
speeds  convergence.  During  training,  the  learning  rate 
is  reduced  gradually  until  the  weight  matrix  W  stops 


Cj  is  the  j*  component  map  computed  from  C  -  WX, 
and  X  is  the  N  by  V  data  matrix. 

ILLUSTRATIVE  THOUGHT  EXPERIMENT 

To  assist  in  the  reader's  appreciation  of  the  concep¬ 
tual  differences  between  ICA,  PCA,  and  correlation 
analyses,  we  propose  and  analyze  a  two-dimensional 
"thought  experiment"  (Fig.  4).  Imagine  an  flVlRI  data 
set  that  is  the  sum  of  the  contributions  of  just  two 
spatially-independent  component  processes  (ICl  and 
IC2).  The  data  are  recorded  at  two  separate  time  points 
during  an  experimental  session.  At  time  point  t  -  1, 
the  subject  is  performing  an  experimental  task,  w'hile 
time  point  t  =  2  occurs  during  a  control  task  condition. 
The  twm  component  processes,  portrayed  schemati¬ 
cally  in  Figure  4a,  are  primarily  active  during  the 
control  and  experimental  task  periods,  respectively. 
Component  IC2  is  mostly  task-related,  since  it  is  highly 
active  at  t  =  1  and  only  weakly  active  at  t  2. 
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Figure  4. 

Simulated  experimenL  a:  A  simple  “thought  experiment"  to 
demonstrate  differences  between  tCA,  PCA,  and  correlation 
analysis  methods.  A  hypothetical  fMRl  data  set  is  the  sum  of  the 
activity  of  just  two  spatially-independent  processes  (iC  i  and  tC2) 
recorded  at  two  observation  times  (t  =  I ,  experimental;  t  =  2, 
control).  We  assume  that  process  fC2  is  mostly  task-reiated  (e.g., 
it  is  highly  active  at  t  =  I  and  only  weakly  active  at  t  =  2),  and 
process  tCI  (e.g.,  representing  endogenous  activity  or  machine 
artifact)  is  more  active  at  t  =  2.  We  further  assume  that  the 
distributions  of  voxel  values  of  the  two  components  specifying  the 
locations  of  the  processes  are  independent  of  one  another,  b:  The 
voxels  with  the  largest  map  values  in  the  two  hypothetical 
component  distributions  are  active  voxels  of  the  components,  c: 
The  simplest  reference  function  useful  for  detecting  task-related 
activations  using  correlation  analysis  is  active  (=1)  during  the 
experimental  task  and  silent  (=0)  during  the  control  task. 


Component  ICl  (representing  either  endogenous  activ¬ 
ity'  or  machine  artifact)  is  somewhat  more  active  at  t  = 
I  than  at  t  =  2.  VV%  assume  that  the  distributions  of 
voxel  values  for  the  hx'o  components  are  independent 
of  one  another,  with  fairly  small  and  discrete  sets  of 
active  voxels  (such  as  those  indicated  for  the  cartoon 
head  in  Fig.  4b).  Here,  a  simple  reference  function  for 
detecting  task-related  brain  areas  via  correlation  (Fig. 
4c)  will  have  the  values  1  (=  "on")  at  t  =  1  and  0 
(=  "off")  at  t  =  2. 


Figure  5  (top)  show’s  a  scatter  plot  of  the  hypotheti¬ 
cal  MRl  data.  Here,  for  each  voxel  the  signal  value 
recorded  at  time  t  =  1  is  plotted  against  its  value  at  t  = 
2,  The  relative  activations  of  components  ICl  and  IC2 
(Fig,  4)  appear  in  Figure  5  as  fixed  vector  directions. 
The  assumption  that  component  processes  ICl  and  IC2 
are  spatially  independent  implies  that  the  data  points 
wTil  tend  to  fill  along  each  of  the  vectors  labeled  ICl 
and  IC2.  Note  that  the  distribution  of  data  values  at  t  = 
1  is  correlated  with  the  data  distribution  at  t  =  2.  Thus, 
the  marginal  probability  distributions  of  the  data  in  i  ts 
current  form  are  not  uniform,  and  the  data  distribution 
cannot  have  maximum  entropy. 

Applying  a  suitable  linear  transformation,  W,  to  the 
data  transforms  it  into  the  rectangular  data  distribu¬ 
tion  showm  in  Figure  5b.  Under  W,  the  ICl  and  IC2 
vectors  in  the  upper  plot  are  mapped  into  the  orthogo¬ 
nal  basis  axes  ICl'  and  IC2',  which  "unmix"  the 
contributions  of  processes  ICl  and  IC2  to  the  data.  The 
assumed  spatial  independence  of  ICl  and  1C2  means 
that  the  transformed  data  are  then  rectangular,  and 
thus  have  higher  entropy  than  the  original  data.  If  each 
component  map  is  sparse,  with  relatively  few’  large 
values,  passing  the  transformed  data  through  a  sigmoi¬ 
dal  nonlinearity  g()  (Fig.  5c)  wTU  more  evenly  spread 
out  the  data  within  the  rectangle,  producing  a  data 
distribution  g(WX)  that  has  still  larger  entropy.  The 
ICA  algorithm  attempts  to  find  directions  ICl  and  1C2 
by  iteratively  adjusting  W  so  as  to  maximize  the 
entropy  of  the  resulting  transformed  distribution  g(WX) 
(Fig,  5c).  Note  also  that  the  linear  transform,  W,  is  in 
general  unique  only  up  to  scaling  and  permutation 
(e.g,,  W,  might  switch  the  orders  of  ICl  and  IC2). 

.Active  voxels  in  the  ICl  and  1C2  component  maps 
(e.g.,  voxels  that  would  be  indicated  in  a  head  image 
such  as  Fig.  4b)  are  those  that  project  most  strongly  on 
vectors  ICl'  and  IC2'  (e.g.,  the  voxels  inside  the  solid 
parallelograms  in  Fig.  5a).  .Active  voxels  according  to 
correlation  of  the  data  with  the  assumed  reference 
function  are  those  w’hose  projections  onto  the  reference 
function  direction  (COR)  exceed  some  threshold  (e.g., 
those  inside  the  dashed  rectangle  in  Fig.  5a).  Unlike 
ICA,  principal  component  analysis  (PCA)  finds  or¬ 
thogonal  directions  of  maximum  variance  in  the  data. 
The  eigen\'ector  associated  with  the  first  principal 
component  points  in  the  direction  of  maximum  %'ari- 
ance  of  the  data  (PCI  in.  Fig.  5a).  In  general,  this  has  no 
specific  relationship  to  the  directions  (i.e.,  time  courses) 
of  the  independent  components.  Active  voxels  in  the 
PCI  direction  are,  e.g,,  those  inside  the  tilted  dotted 
rectangle  in  Figure  5a.  The  second  principal  compo¬ 
nent  of  these  data  (PC2)  is  by  definition  orthogonal  to 
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Analysis  of  simulated  experiment,  a:  A  scatter  plot  of  the 
hypothesized  fMRI  signal  values  {at  times  t  =  I  and  t  =  2)  for  each 
brain  voxel  contains  arrows  1C  I  and  IC2,  which  show  the 
directions  determined  by  the  relative  activations  of  the  two 
component  processes  (Fig.  4).  The  assumption  of  spatial  indepen¬ 
dence  of  the  1C  I  and  IC2  maps  implies  that  the  data  will  vary 
independently  along  these  two  component  vectors.  The  two 
parallelograms  (solid  borders)  indicate  the  active  voxels  for  each 
component  (e.g.,  the  voxels  highlighted  in  Fig.  4b).  Active  voxels  by 
correlation  analysis  are  those  whose  projections  onto  the  refer¬ 
ence  function  (COR)  exceed  an  arbitrary  correlation  threshold, 
e.g.,  those  enclosed  by  the  rectangle  (dashed  borders).  The  first 
principal  component  of  the  data  set  (PC  1 )  points  in  the  direction  of 
maximum  variance  of  the  data,  but  has  no  direct  relationship  to  the 
two  independent  component  directions  (iCI  and  IC2).  Active 
voxels  associated  with  the  first  principal  component  are  those  lying 
inside  the  tilted  rectangle  (dotted  borders).  ICA.  PCA,  and 
correlation  analyses  find  overlapping,  but  typically  not  identical, 
collections  of  active  voxels.  Only  ICA  will  find  the  active  areas  of 
each  Independent  component  (Fig.  4b).  bt  The  independent 
component  directions  ICI  and  IC2  can  be  indirectly  determined  by 
finding  the  linear  transform  W,  which  results  in  a  rectangular 
distribution,  c:  The  sigmoid  transformation  g(WX)  produces  the 
most  uniform  (i.e.,  maximum  entropy)  distribution  for  the  data 
shown.  The  ICA  algorithm  of  Bell  and  Sejnowski  [1995]  adjusts 
ICI '  and  IC2'  to  maximize  the  entropy  of  the  distribution. 


the  first  component,  but  also  has  no  particular  relation¬ 
ship  to  either  of  the  independent  components. 

Figure  5a  shows  that  there  may  be  overlap  in  the 
collections  of  active  voxels  determined  by  correlation 
analysis,  PCA,  and  ICA,  but  these  three  methods  for 
finding  voxels  activated  during  an  experimental  task 
usually  will  not  give  identical  results.  To  the  extent  that 
assumptions  of  linear  summation,  spatial  sparsity,  and 
statistical  independence  between  components  are  valid, 
ICA  should  more  accurately  determine  the  exact  spa¬ 
tial  extents  and  time  courses  of  task-related  as  well  as 
nontask-related  activations  contributing  to  the  data. 

METHODS 

Subjects  and  image  acquisition 

A  total  of  4  normal  volunteer  subjects  participated  in 
tw^o  flVIRI,  experiments.  In  the  first,  3  subjects  per¬ 
formed  a  Stroop  color-naming  task.  In  the  second,  a 
fourth  subject  performed  a  word/number  task.  Each 
experiment  consisted  of  two  6-min  trials  of  the  same 
task,  irtterspersed  by  trials  involving  other  cogmtive 
tasks,  not  reported  here.  Each  trial  consisted  of  five 
40-sec  control  blocks  alternating  with  four  40-sec 
experimental  task  blocks. 

Subjects'  BOLD  signal  brain  activity  was  scanned 
using  a  1,5T  GE  Sigma  MRI  system  GE  Medical 
Systems  (Waukesha,  WI)  equipped  with  an  inserted 
three-axis  balanced  torque  head  gradient  coil  designed 
for  rapid  switching  [Wong  et  al.,  1992].  A  midsagittel 
localizer  slice  assisted  in  determining  landmarks  for 
8-10  (5  mm  thick,  1  mm  interslice  gap)  64  x  64 
echopianar,  gradient-recalled  (TR  =  2500  msec,  TE  ==  40 
msec)  axial  images  with  a  24-cm  field  of  view'.  For  each 
slice,  133-146  images  were  collected  at  2.5-sec  sam¬ 
pling  intervals.  Slices  were  selected  to  include  the 
anterior  cingulate  gyrus,  implicated  by  PET  studies  in 
Stroop  performance  (Bench  et  al.,  1993],  and  portions 
of  the  parietal,  occipital,  and  temporal  lobes.  High- 
definition  anatomical  images  were  also  acquired,  us¬ 
ing  a  spoiled  GRASS  protocol  to  define  the  localization 
of  the  BOLD  signal  changes  with  respect  to  brain 
anatomy. 

Tasks 

Stroop  task 

The  Stroop  colqr-naming  task  is  often  used  to  exam- 
me  disinhibition  and  selective  attention  deficits  in 
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ne  smoothing.  Illustration  of  the  technique  used  for  temporal 
oothing  and  time  alignment.  A  three-point  Hanning  smoothing 
er  was  convolved  with  the  data,  using  slightly  different  lags  for 
;h  brain  slice  to  minimize  offsets  introduced  by  the  successive 
D-msec  sampling  delays  in  the  multislice  acquisition  process. 

tients  with  brain  disorders  [Lezak,  1995].  Stimuli 
arming  a  visual  angle  of  2®  by  3°  were  presented  one 
a  time  by  overhead  projector  onto  a  screen  placed  at 
!  foot  of  the  magnet.  Subjects  viewed  this  screen 
ough  a  mirror  attached  to  the  head  coil.  A  personal 
nputer  containing  a  Cognitive  Testing  System  (Digi- 
,  Inc.,  Edgecomb,  MA)  controlled  stimulus  presenta- 
n.  Stimuli  were  presented  as  near  as  possible  to  the 
iter  of  the  subject's  visual  field.  In  all  conditions, 
ijects  were  instructed  to  covertly  name  the  color  of 
h  stimulus,  which  was  red,  green,  or  blue.  In  control 
cks,  the  subjects  were  simply  required  to  covertly 
ne  the  color  of  a  displayed  rectangle.  During  experi- 
ntal  Stroop-task  blocks,  subjects  were  required  to 
ne  the  color  of  the  script  used  to  print  one  of  the 
ne  color  names  (i.e,,  "red,"  "green,"  or  "blue").  Each 
or  name  was  displayed  in  a  different  color  from  the 
?  it  was  named.  For  example,  when  the  word  "red" 
s  presented  in  blue  script,  the  subject  was  to  think 
i  not  speak)  the  word  "blue."  Each  trial  comprised 
r  task  cy^cles,  each  consisting  of  a  40-sec  control 
ck  and  a  40-sec  experimental  block,  followed  by  a 
il  40-sec  control  block.  The  first  6-min  trial  was 
eated  about  15  min  after  its  initial  presentation  (i.e., 
■r  two  similar  interv'ening  trials).  Each  subject  was 
tested  during  a  training  session  to  determine  the 
‘r-item  interx^al  for  which  they  would  make  verbal 
)rs  on  10-20%  of  the  presented  items.  This  interval 
>  then  used  in  the  experiment. 

rdlnumber  task 

he  Brown  and  Peterson  word /number  task  [Peter- 
and  Peterson,  1959]  has  been  used  in  experimental 


psychology'  and  neuropsychology  to  investigate  word- 
forgetting  over  brief  interv'als  [Lezak,  1995].  Stimuli 
spanning  a  visual  angle  of  1°  by  2°  were  presented  via 
tire  same  apparatus  as  described  for  the  Stroop  task 
trials.  During  control  blocks,  the  subject  simply  fixated 
on  an  asterisk  displayed  in  the  screen  center.  In  the 
word /  number  task  blocks,  the  subject  passively  ob- 
serx'ed  a  word  that  was  displayed  for  2  sec.  During  the 
following  6  sec,  an  integer  betw'een  100-900  was 
shown  on  the  screen,  and  the  subject  was  to  mentally 
add  successive  7s  to  il  while  stiU  remembering  the 
w'ord.  For  example,  if  the  number  displayed  were  300, 
the  subject  was  to  think  covertly,  "307, 314, 321. . The 
subject  was  not  asked  to  explicitly'  recall  the  presented 
word.  Each  40-sec  task  block  contained  five  word/ 
number  stimulus  pairs. 

Preprocessing 

A  set  of  8-10  slices  was  collected  in  cyclic  order 
every  2.5  sec.  This  rate  is  faster  than  the  time  constant 


Figure  7. 

Relative  contributions  of  iCA  and  PCA  components.  The  eight 
upper  traces  show  the  fractional  contributions  to  the  observed 
data  of  the  144  ICA  components  for  each  of  the  eight  trials, 
rank-ordered  by  contribution  size.  The  rank  of  the  consistently 
task-related  ICA  component  in  each  trial  Is  indicated.  These 
distributions  of  relative  component  contributions  were  highly 
similar  across  trials,  and  differed  from  the  distribution  of  rank- 
ordered  contributions  of  the  PCA  components  to  the  data  from 
one  of  the  trials  {bottom  trace).  As  expected,  PCA  accounted  for 
much  of  the  data  variance  by  a  few  large  components,  whereas  the 
relative  contributions  of  the  ICA  components,  specifying  the 
spatially  independent  components  comprising  the  signal,  were 
more  equal. 
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(b)  Comparison  of  Three  Linear  Models 
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Figure  8. 

Consistently  task-related  (CTR)  components,  (a)  Results  for  the  3  subjects  performing  the  Stroop  color-naming  task.  Each  subject 
participated  in  two  6-min  trials  composed  of  alternating  40-sec  control  and  Stroop  task  blocks.  ICA  decomposition  of  each  trial  produced 
one  component  whose  time  course  of  activation  strongly  resembled  the  task  block  structure  (r  =  0.64-0.94).  Active  voxels  iCA 
component  map  voxels  (iz!  >  2.(3)  are  shown  in  red,  together  with  voxels  considered  active  by  correlation  in  blue  (r  >  0.4)  for  one  brain 
slice.  Voxels  deemed  active  by  both  methods  are  shown  in  yellow.  Dorsolateral  frontotemporal  activations  were  detected  only  by  ICA. 
(b)  Comparison  of  CTR  component  maps  for  PCA,  the  fourth-order  technique  of  Comon  [1994],  and  ICA.  The  most  consistently 
task-related  component  maps  are  shown  for  one  of  the  Stroop  sessions  {subject  2,  trial  I )  (cf.  Fig.  9),  Axial  slices  reveal  more  focal  regions 
of  activity  by  ICA  and  the  fourth-order  decomposition  of  Comon  [1994],  while  the  PCA  map  is  more  speckled  or  diffuse,  and  does  not 
reveal  the  extensive  occipital  activations  shown  by  the  other  decompositions  as  well  as  by  correlation  (a).  Red  voxels  are  activated  with 
the  shown  time  course,  white  blue  voxels  are  activated  opposite  to  the  shown  time  course. 


of  the  BOLD  signal  hemodynamic  response  function  almost  exclusively  outside  the  brain  and  were  there- 
that  typically  peaks  5-8  sec  after  stimulus  onset  [Ban-  fore  excluded  from  analysis. 

dettini  et  al.,  19921.  The  data  were  not  registered  to  As  shown  in  Figure  6,  data  were  temporally 
correct  for  head  movement.  Voxels  indexing  active  smoothed  using  a  3-point  filter  based  on  a  Hanning 
brain  regions  were  determined  by  examining  the  mean  window  [Press  et  ai.,  1992].  The  three  points  were 
value  of  the  time  series  of  each  voxel.  These  mean  shifted  along  the  window  by  250  msec  for  each 
voxel  values  invariably  followed  a  bimodal  probability  successive  slice,  to  decrease  the  time  misalignments 
distribution.  The  local  minimum  beriveen  the  two  induced  by  the  250-msec  acquisition  delays  between 
peaks  of  a  third-order  polynomial  fitted  to  the  voxel  slices.  The  filtered  BOLD  signals  from  ail  brain  voxels 
mean-value  histogram  determined  a  cutoff  value  above  at  each  time  point  were  placed  into  subsequent  rows  of 
which  voxels  were  assumed  to  contain  active  brain  thedatamatrix.  The  nieanofeach  row  (time  point)  was 
signal.  Voxels  with  weaker  signals  were  foimd  to  lie  then  subtracted  from  the  data. 


PCA  (2nd  Order)  Comon  (4th  Order)  ICA  (all  Orders) 


Subj/Trial 


r=0.52 

r=0.60 

yVvvvv 

f=0.42 

r=0.53 

„  r=0.86 

v\rVvA 

MM 

r=0.50 

r=0.88 

r=0.94 

yAA/i 

jwv\ 

r=0.47 

r=0.76 

r=0.72 

VWiA 

Uv\aA 

r=0.55 

r=0.82 

r=0.84 

r=0.63 

r=0.68 

r=0.91 

/wV^ 

A^AiV 

“  JlfWl 

jmiui 

jirum 

Mean:  r=0.52 

r=0.72 

Figure  9. 

r=0.82 

Component  independence.  Comparisc«s  of  three  linear  decompo-  the  imposed  criterion  for  spatial  independence  between  maps 
sition  techniques,  PCA,  the  fourth-order  algorithm  of  Comon  becomes  more  strict,  from  PCA  (second  order),  to  the  technique 
[1994],  and  the  (CA  algorithm  of  Bell  and  Sejnowski  [i99S},  For  of  Comon  (fourth  order),  to  the  current  iCA  method  (ail  orders), 
each  of  the  three  techniques,  the  component  time  course  most  there  is  stronger  agreement  between  the  CTR-component  time 
closely  matching  the  Stroop  task  reference  function  is  shown.  As  course  and  the  reference  function. 


*  McKeown  et  aL  « 


The  ICA  algorithm  was  applied  separately  to  data 
from  two  6-min  trials  for  each  subject.  Analysis  Wcis 
performed  using  a  matrix  code  implemented  in  MAT- 
LAB  4.2  (Mathw^orks,  Inc.).  Convergence  of  the  ICA 
analysis  for  each  6-min  trial  session  typically  took  90 
min  on  a  Digital  Equipment  Corporation  Alpha  2100A 
computational  server. 

Chice  VV  had  been  determined  by  the  algorithm, 
component  maps  C  were  derived  using  Equation  (4), 
The  time  course  of  activation  of  each  component  was 
contained  in  the  corresponding  column  of  W“‘.  For 
comparison,  the  eigenimages  from  each  trial  w'ere 
determined  using  standard  PCA  techmques  [Jackson, 
1991],  along  with  their  associated  time  courses.  To 
determine  the  effects  of  higher-order  statistics  on 
determining  uncorrelated  spatial  maps,  an  ICA  tech¬ 
nique  using  fourth-order  cumulants  proposed  by  Co- 
mon  [1994]  w-as  also  used  to  find  partially  independent 
maps  and  associated  time  courses.  Convergence  of  the 
Comon  algorithm  tv'pically  took  360  min  of  computer 
time  per  trial. 

For  each  trial,  a  reference  function  was  constructed 
by  convolving  a  square  wave  matching  the  time 
course  of  the  experimental/control  task  blocks  with  a 
crude  approximation  of  the  BOLD  impulse  response 
function,  a  rectangular  function  of  7.5-sec  duration 
(Fig.  1).  This  reference  function  was  then  corre¬ 
lated  with  the  signal  time  course  of  each  voxel 
[Bandettini  et  aL,  1993]  and  with  the  time 
courses  of  the  maps  derived  by  the  ICA  and  PCA 
techniques 
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where  ri,  is  the  correlation  coefficient  for  the  k*  voxel, 
Xii<  is  the  recorded  value  of  the  k*  voxel  at  the  i*  time 
point,  and  y;  is  the  reference  function  at  the  i*  time 
point. 

A  cutoff  value  of  r;.  -  0.4  was  used  as  the  correlation 
significance  threshold.  Voxels  whose  n,;  exceeded  this 
limit  were  considered  active  voxels  by  the  correlation 
method.  ICA  revealed  that  one  trial  from  one  subject 
contained  a  prominent  linear  trend.  Therefore,  the  data 
from  this  trial  wrere  linearly  detrended  before  correla¬ 
tion  analysis  for  fair  comparison  wTth  the  ICA  results. 


Here, 

X;,  =  X,,  -  [m|i  -T  bj]  (18! 

where  X,j  is  the  recorded  value  of  the  voxel  at  the  i* 
time  point,  xj  is  the  time  series  of  the  j'-'"  voxel  after 
detrending,  and  m,  and  b;  are  defined  by 
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and  n  is  the  number  of  time  points  in  the  trial. 

For  each  trial,  the  computed  ICA  component  and 
correlation  active  voxels  were  read  into  the  functional 
neuroimaging  display  program  MCW  AFNI  [Cox, 
1996]  for  display  and  registration  with  the  structural 
Tl-weighted  MKl  brain  images. 

RESULTS 

ICA,  PCA,  the  fourth-order  method,  and  correlation 
analysis  were  applied  to  the  eight  data  sets  for  the 
Stroop  and  name/ word  task  outlined  in  Methods.  The 


Figure  10, 

Consistent  activation  across  trials  during  the  Stroop  task.  The 
CTR  component  maps  from  both  trials  from  each  subject  were 
spatially  smoothed  with  a  3-D,  6-mm.  full-width  half-maxi¬ 
mum  Gaussian  kernel  and  averaged  over  trials.  The  scatter 
plots  (at  right)  plot  the  smoothed  voxel  z-vaiues  from  the 
CTR  map  in  trial  2  (axis:  -S  <  z  <  5)  vs.  the  smoothed  z-values 
in  the  CTR  map  obtained  from  trial  I  (axis;  -5  <  z  <  5),  along 
with  the  correlation  of  each  voxel  with  the  reference  function  in 
trial  2  (axis:  -0,5  <  r  <  0.5)  vs.  the  correlation  values  in  trial  1 
(axis:  -0-5  <  r  <  0.5).  The  oblique  lines  in  the  scatter  plots 
correspond  to  the  z-value  thresholds  labeled  in  the  ICA  CTR 
component  maps.  Correlational  thresholds  were  those  giving 
an  equal  number  of  active  voxels  as  the  number  of  active  voxels  in 
the  ICA  CTR  component  at  the  various  z-thresholds.  Note  that 
frontal  activity  in  the  second  subject  was  detected  only  fay  ICA 
(middle). 
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Figure  i  O. 


contributions,  7,,  of  each  ICA  component  to  the  data 
were  computed  as  in  Equation  (6),  These  ranged  from 
0.08—5  X  10“'’.  The  distributions  of  the  ICA  component 
contributions  were  similar  across  trials  and  were  quite 
unlike  the  distribution  of  contributions  of  projections 
on  the  principal  components  of  the  same  data  (Fig.  7). 
Some  maps  contained  multifocal  groupings  of  active 
voxels,  while  others  (usually  those  with  contribu¬ 
tions  <  0.01)  had  diffuse  or  "speckled"  spatial  distribu¬ 
tions.  The  time  courses  of  the  components  could  be 
grouped  into  broad  classes.  The  time  courses  of  some 
components  followed  part  of  or  the  entire  task  block 
design  (CECECE. . .),  while  others  were  slowly  vary¬ 
ing,  quasiperiodic,  or  noisy  in  appearance. 

Consistently  task-related  components 

In  ail  trials,  exactly  one  ICA  component  had  a  time 
course  that  was  highly  correlated  (r  =  0.64-0.94)  with 
the  reference  function.  In  ail  cases,  this  consistently 
task-related  (CTR)  component  had  a  relatively  low 
contribution  rank  (14th-41st).  Maps  of  active  voxels 
for  these  task-related  components  (using  a  'zj  >  2 
threshold)  contained  areas  of  activation  resembling 
those  produced  by  the  correlation  method  with  r  >  0.4. 
During  Stroop  trials,  ICA  and  the  correlation  method 
detected  task-related  activation  in  Brodmann's  areas 
18  and  19  (not  involving  the  calcarine  fissure)  and  in 
the  supplementary  motor  area  and  cingulate  system. 
In  each  of  the  trials,  the  ICA  method  also  detected 
task-related  activation  in  prefrontal  areas,  including 
the  left  dorsolateral  prefrontal  cortex  (Figs.  8,  10,  14, 
17).  The  first  subject  performing  the  Stroop  task  was 
later  found  to  have  been  vocalizing  the  words  during 
the  first  trial  rather  than  stating  them  covertly,  prob¬ 
ably  introducing  additional  motion  artifact  in  the  data. 

Figure  9  compares  the  time  courses  of  the  compo¬ 
nents  best  matching  the  reference  function  for  each  of 
the  three  linear  fnodels  used  in  the  Stroop  task:  PCA, 
the  fourth-order  method,  and  the  ICA  algorithm  de¬ 
scribed.  Several  PCA  component  maps  (Fig.  8b)  had 
associated  time  courses  that  were  moderately  corre¬ 
lated  with  the  reference  function,  although  these  corre¬ 
lations  were  lower  than  for  the  CTR  ICA  components. 
In  general,  as  successively  stricter  criteria  for  spatial 
independence  between  individual  maps  were  applied, 
the  time  courses  of  the  maps  more  closely  matched  the 
reference  function  (Fig.  9). 

In  order  to  detect  areas  of  activation  consistent 
across  trials,  the  CTR  and  correlation  maps  from  each 
of  the  two  Stroop  trials  from  each  subject  were  aver¬ 


aged  after  spatially  smoothing  each  map  with  a  ttiree- 
dimensional  (3-D),  6-tnm,  full-width-half-maximum 
Gaussian  kernel.  As  showm  in  Figure  10,  the  frontal 
activation  detected  in  all  3  subjects  by  ICA  was  robust 
to  changes  in  the  z-threshold  for  activation.  Reducing 
the  threshoid  added  active  areas  adjacent  to  regions  of 
activation  found  with  higher  thresholds.  Only  the  ICA 
method  detected  frontal  activation  in  the  .second  sub¬ 
ject,  even  after  a  significant  reduction  of  the  correla¬ 
tional  threshold  (r  =  0.16).  Frontal  activation  was  de¬ 
tectable  by  correlation  in  the  third  subject,  but  only  by 
reducing  the  threshold  (r  =  0.23)  until  apparent  activa¬ 
tion  in  the  basal  ganglia,  thalamus,  and  lateral  ven¬ 
tricles  was  observed. 

In  Figure  11,  the  four  80-sec  task  cycles  of  the  CTR 
ICA  component  in  each  of  the  Stroop  trials  are  superim¬ 
posed.  In  each  trial,  the  fine  temporal  structure  of  the 
activation  was  stereotc^ped  within  subjects.  The  right 
side  of  Figure  11  shows  the  mean  of  the  eight  ICA 
component  task  activations  in  the  tw'o  trials  from  each 
subject,  superimposed  on  the  expected  response  (one 
cycle  of  the  task  reference  function).  Note  that  the 
mean  time  courses  for  each  subject  (Fig.  11,  right 
column)  were  not  reliably  estimated  by  the  reference 
function,  suggesting  that  the  true  extent  of  hemody¬ 
namic  activation  during  Stroop  task  performance  was 
not  constant  but  tended  to  decline  during  the  course  of 
the  experimental  blocks.  All  3  subjects  showed  greater 
activation  near  the  beginning  of  the  trial.  Subjects  also 
differed  in  the  rise-time  of  activation.  These  details 
tended  to  be  consistent  across  task  cycles.  Further,  the 
time  course  given  by  ICA  much  more  closely  re¬ 
sembled  the  mean  time  courses  of  the  most  active 
voxels,  as  determined  by  either  ICA  or  correlation, 
than  did  the  idealized  reference  function. 

For  the  subject  performing  the  word /number  task, 
areas  of  significant  activation  by  ICA  and  correlation 
were  again  similar  (Fig,  12).  Both  methods  found 
task-related  activation  in  Brodmann's  areas  18  and  19 
and  in  left  occipital-parietal  areas.  ICA  also  indicated 
significant  activation  in  frontal  and  temporal  regions. 

Figure  13  shows  a  scatter  plot  displaying  each 
voxel's  value  in  the  consistently  task-related  map  vs. 
its  correlation  with  the  reference  function  for  one 
Stroop  trial  (subject  2,  trial  1).  Both  methods  found  47 
active  voxels  in  conunon  (Fig.  13,  upper  right).  ICA 
also  found  175  voxels  whose  correlation  with  the 
reference  function  was  <0.4  (including  some  whose 
correlation  with  the  reference  function  w'as  near  zero). 
However,  the  mean  time  course  of  these  175  voxels 
(Fig.  13a,  upper  middle)  dearly  reflected  the  alternat¬ 
ing  task-block  sequence,  supporting  the  implication  of 


*  Independent  Component  Analysis  of  fMRI  Data 


Time  (s) 

Figure  I  I. 

Consistency  of  ICA  in  task-reiated  activations.  At  left  and  center  function  used  in  the  correlation  analysis  is  superimposed  on  the 

are  superimposed  the  four  successive  80-sec  task  cycles  of  the  component  means  for  comparison.  Note  the  stereotyped  details  of 

consistently  task-related  component  activations  from  each  of  the  the  experimental  task  activations  in  at  least  four  of  the  trials,  and 

six  Stroop  trials.  Right:  The  means  of  the  eight  task-cycle  activa-  the  individual  subject  differences  between  the  mean  activations  and 

tions  for  each  of  the  3  subjects.  A  single  cycle  of  the  reference  the  assumed  task  reference  function. 


the  ICA  results  that  activity-  at  these  s'oxels  was 
influenced  bv  task  performance. 

Since  the  data  for  this  trial  contained  a  prominent 
linear  trend,  linearly  detrending  the  data  before  corre¬ 
lating  with  the  reference  function  (Fig.  13b)  increased 
the  overlap  between  the  results  of  the  tw^o  techniques 
(from  47  voxels  in  common  to  105).  However,  there 
Were  roughly  as  many  \'oxels  that  each  method  de¬ 
tected  individually  (117.  113)  that  the  other  did  not. 


After  linear  detrending  prior  to  correlation,  frontal 
activation  was  detected  by  both  methods  (Fig.  14),  The 
failure  of  ICA  to  detect  as  significant  some  of  the  active 
voxels  detected  by  correlation  might  be  explained  b}^ 
the  participation  of  these  voxels  in  other  !C.A  compo¬ 
nents  transiently  time-locked  to  the  task  block  design 
(see  below)  or  by  inaccuracy  in  selecting  equiy-alent 
ICA  and  correlation  thresholds  (here  z  =  2.0,  r  -  0.4). 
The  mean  time  course  of  the  105  active  voxels  by  both 
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Figure  12. 

Word/number  task  activations,  Consistently  task-related  component  activations  for  the  subject  performing  the  word/number  task  in  two 
trials  Again,  the  ICA  decomposition  included  a  single  component  whose  time  course  was  highly  correlated  with  the  task  block  structure 
of  the  trial.  This  component  had  more  active  voxels  {x  >  2.0)  in  posterior  visual  association  areas  than  were  found  by  correlation  with  the 
reference  function  (r  >  0.4).  ICA  also  found  active  frontal  and  lateral  regions  not  detected  by  correlation  analysis. 
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Figure  13. 

ICA  vs.  correlation,  at  A  scatter  plot  comparing  voxel  values  in  the 
CTR  component  map  {subject  2,  trial  I )  to  correlations  of  the 
signal  at  each  voxel  with  the  task  reference  function.  Horizontal 
lines  {z  =  Z:2)  separate  the  voxels  into  active  and  inactive  subsets 
according  to  tCA,  while  vertical  lines  (r  —  ±0.4)  indicate  the 
threshold  for  active  voxels  used  in  the  correlation  analysis.  The 
numbers  of  voxels  falling  in  the  resulting  portions  of  the  plot  are 
noted.  The  plotted  waveforms  represent  the  mean  time  courses  of 
die  voxels  in  each  portion.  Forty-seven  voxels  were  considered 
active  by  both  analytical  methods  (upper  right).  The  ICA  method 
selected  1 75  voxels  as  active  that  were  considered  inactive  by 
correlation  {upper  center).  The  mean  time  course  of  these  voxels 
clearly  showed  task-related  activation  (upper  center  trace).  Corre¬ 
lation  marked  20  voxels  as  active  whose  iCA  map  values  were 
considered  inactive  (right  center),  b:  When  the  linear  trend  was 
removed  from  the  time  course  of  each  voxel  before  correlating 
with  the  reference  function,  the  number  of  voxels  considered 
active  by  both  methods  increased  to  105  (top  right),  and  the 
numbers  of  voxels  considered  active  by  one  method  only  ( !  1 7/ 

1 1 3)  were  equalized.  Note  that  the  mean  time  course  of  the  1 05 
voxels  detected  by  both  methods  (solid  line,  upper  right)  was 
l^ighly  similar  to  the  detrended  CTR  component  time  course 
(broken  line,  upper  right). 


methods  (solid  trace,  upper  right)  closely  resembled 
the  CTR  component  time  course  (broken  trace). 

The  unique  consistently  task-related  component  in 
each  trial  had  a  multifocal  character,  as  shown  in 
Figures  8a,  12.  The  other  140  or  more  components  for 
each  trial  could  be  grouped  empirically  into  several 
broad  classes,  described  below  according  to  general 
features  of  their  spatiotemporal  structure. 

Transiently  task-related  components 

Some  components  appeared  to  be  time-locked  to  the 
task-block  design  during  part  of  the  trial.  For  example, 
the  acti\'e  areas  for  the  component  shown  in  Figure  15a 
included  frontal  and  occipital  regions.  This  component 
was  abruptly  activated  during  the  second  Stroop  task 
block  but  not  during  other  task  blocks.  Such  tran¬ 
siently  task-related  (TTR)  activity  might  not  be  de¬ 
tected  by  a  correlational  analysis  that  averaged  over  all 
the  task-block  cycles  in  a  trial. 

Slowly  varying  components 

In  most  trials,  there  were  also  slowly  varying  compo¬ 
nents  (Fig.  15b).  In  the  trial  showTi,  voxels  indexing  regions 
of  the  ventricular  system  were  separated  into  one  slowly- 
varying  component  (Fig.  15b,  solid  line),  implying  that 
part  of  the  BOLD  signals  at  these  voxels  changed  in 
symchrony  with  the  time  course  of  this  comporrent.  The 
dotted  line  in  Figure  15b  shows  the  mean  time  course  of 
the  active  voxels  for  this  component.  Note  that  the  time 
courses  of  the  components  showm,  although  monotonic, 
are  not  linear  and  therefore  could  not  be  removed,  entirely 
by  linearly  detrending  the  data. 

Quasiperiodic  components 

In  data  from  each  of  the  4  subjects,  several  compo¬ 
nents  had  approximately  oscillating  time  courses  with 
bimodal  periods  near  14  or  40  sec  (Fig.  15c).  These 
componeirts  showed  similar  areas  of  activation  in  both 
trials,  mostly  restricted  to  a  single  brain  slice. 

Movement-related  components 

Some  components  had  abrupt  changes  in  their  time 
course  and,/ or  ring-like  spatial  distributions,  suggest¬ 
ing  sudden  or  gradual  head  movements.  The  distribu¬ 
tion  of  positive  and  negative  voxel  values  for  the 
component  shown  in  Figure  15d  and  its  abruptly 
shifting  time  course  suggest  the  effect  of  a  torsional 
head  movement  in  the  coronal  plane.  Other  compo¬ 
nents  had  a  "ring-like"  spatial  structure,  like  those  shown 
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Figure  14. 

ICA  vs.  correlation  widt  linear  detrending.  The  spatial  distribution  of  voxels  detected  by  ICA  (red, 
z  S;  2)  and  by  correlation  (blue,  r  a  0.4)  after  detrending.  Same  data  set  as  Figure  1 3.  Note  the 
frontal  regions  of  activation  detected  by  both  methods  (circles). 


in  Figure  15e,  which  we  suspect  represented  motion  in  the 
axial  plane.  Head-movement  simulations  (reported  be¬ 
low)  tended  to  support  this  hypothesis. 

Residual  noise  components 

The  smallest  ICA  components  (especially  those  with 
contribution  rankings  of  90-144)  had  diffuse  or  "noisy 
spatial  and  temporal  patterns  and  most  probably  repre¬ 
sented  noise  in  the  data.  Their  time  courses  and  maps 
were  not  reproducible  across  applications  of  the  algo¬ 
rithm,  even  on  the  same  data.  The  noisy  character  of  one  of 
these  small  components  is  clear  in  Figure  15f,  which  shows 
the  random  distribution  of  active  voxels  in  tvv'o  axial  slices. 

Voxels  active  in  several  components 

Figtire  16  demonstrates  that-  a  single  voxel  could 
participate  significantly  in  several  ICA  components  of 


more  than  one  of  the  types  listed  above.  In  Figure  16, 
the  time  course  of  the  BOLD  signal  of  the  voxel 
highlighted  in  the  center  image  is  shown  beneath.  This 
voxel  was  highly  weighted  (z  =  5.0)  in  the  CTR  compo¬ 
nent  (Fig.  16,  middle  right),  but  was  also  active  in  three 
other  components  of  various  types.  The  ICA  method  is 
able  to  determine  that  a  voxel  is  an  active  participant  in 
a  CTR  component  even  though,  because  of  the  influ¬ 
ence  of  other  component  processes,  that  voxel's  time 
course  may  not  appear  to  be  task-related.  Calculations 
showed  that  most  voxels  were  active  (|z!  >  2)  in  1-6 
components,  and  that  on  average  each  voxel  was 
active  in  3.16  components. 

The  spatiotemporal  structure  of  task-related 
activations 

Figure  17  shows  four  consistently  or  transiently 
task-related  components  from  one  Stroop  trial  (subject 
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3,  trial  2).  The  time  courses  ot  the  TTR  components 
appear  time-locked  to  that  of  the  CTR  component  for 
part  of  the  trial.  By  summing  the  contributions  of  these 
components,  a  more  complete  and  dynamic  represen¬ 
tation  of  the  spatial  and  temporal  structure  of  task- 
related  activity  can  be  reconstructed. 

Tests  of  reliability  and  simulations 

To  further  our  understanding  of  the  reliability  of 
task-related  ICA  components,  the  data  set  from  one 
trial  (subject  2,  trial  1)  was  manipulated  in  different 
ways  and  the  resultant  data  sets  were  analyzed  again 
bv  ICA  to  determine  the  effects  of  the  manipulations 
on  the  computed  ICA  CTR  component. 

Robustness  against  added  noise 

The  standard  de%Tations  of  the  BOLD  signal  over  the 
course  of  the  selected  trial  w-ere  computed  for  each  of 
the  voxels  and  then  ordered  by  relative  size.  The  voxels 
with  the  smallest  signals  (those  in  the  low'est  quartile) 
had  quite  similar  standard  deviations.  TTeir  mean  was 
used  as  an  estimate  of  the  baseline  noise  level  in  the 
data.  Independent  zero-mean  Gaussian  noise  samples 
drawn  from  distributions  w’hose  standard  deviations 
were  various  percentages  of  the  baseline  noise  w'ere 
added  to  each  time  point  of  every  \''Oxel  in  the  trial. 
These  noise-added  data  sets  were  then  analyzed  as  if 
they  were  raw  data  to  determine  whether  a  CTR  ICA 
component  could  still  be  extracted.  Even  after  adding 
Gaussian  noise  at  a  level  equal  to  that  of  the  estimated 
noise  baseline,  a  CTR  component  was  recovered  (Fig. 
18).  .Although  the  exact  morphology'  of  the  square- 
wa\?e  time  course  varied  slightly  betw^een  the  various 
runs,  correlations  between  the  original  CTR  compo¬ 
nent  time  course  and  the  time  courses  of  the  CTR 
component  extracted  from  the  noise-added  data  were 
all  above  r  >  0.8. 

Component  reliability 

To  further  test  the  reliability  of  CTR  components,  the 
data  from  one  Stroop  trial  w^-ere  split  into  odd  and  even 
time  points,  and  each  of  the  two  72-time-point  data 
sets  were  decomposed  separately  using  ICA.  Figure  19 
shows  that  both  the  odd  and  even  decompositions 
returned  a  component  w'hose  time  course  and  map 
voxel  value  distribution  were  highly  related  to  those  of 
the  original  CTR  component. 

Detection  of  simulated  head  movement 

The  ability-'  of  the  IC.A  method  to  detect  abrupt  head 
movement  was  investigated  by  artificially  simulating  a 


head  shift  by  one  voxel  in  a  diagonal  direction  (4.2 
mm)  midway  through  one  trial.  The  largest  compo¬ 
nent  of  the  ICA  decomposition  of  the  resulting  data 
consisted  of  a  step  function  at  the  appropriate  time 
point  (Fig.  20a).  The  map  for  this  component  w'as 
concentrated  at  the  cortical  surface  with  opposite  signs 
in  the  right  frontal  and  left  occipital  areas,  i.e.,  the 
regions  of  maximum  signal  change  following  the 
simulated  movement.  Another  simulation  (not  showm) 
demonstrated  that  the  ICA  algorithm  could  also  be 
used  to  readily  detect  simulated  head  movements  of 
one  quarter  of  a  voxel  (—1  mm).  The  ICA  decomposi¬ 
tion  of  some  trials  included  components  whose  wave¬ 
forms  contained  sharp  transient  shifts  (Fig.  15d)  and/or 
W'hose  maps  had  a  similar  ring-like  structure  (Fig.  15e). 
We  tentatively  interpret  these  components  as  arising 
from  small  abrupt  or  gradual  head  movements  during 
the  trial. 

Detection  of  a  simulated  task-related  activation 

A  simulated  task-related  signal  w'ith  a  three-cycle 
time  course  and  a  spatial  distribution  unlike  that  of  the 
actual  CTR  component  was  added  to  (or  subtracted 
from)  the  signals  of  voxels  in  four  arbitrarily  selected 
regions  of  two  brain  slices  (Fig.  20b,  top).  The  iey'el  of 
simulated  activation  was  equal  to  that  of  the  CTR 
component  of  the  same  trial  (Fig.  20b,  upper  left).  ICA 
was  used  to  decompose  the  resulting  data  set.  The  time 
course  of  one  of  the  resulting  independent  components 
corresponded  to  the  simulated  three-cycle  activation, 
w'hile  a  second  component  accounted  for  the  actual 
four-cycle  task-related  activation  (Fig.  20b,  middle 
panety  Active  areas  (:z\  >  2)  of  the  simulated  three- 
cvcle  component  included  every  y'Oxel  in  the  simu¬ 
lated  actiy'e  regions,  plus  just  ty\'o  extraneous  y'oxels. 
Correlation  of  each  voxel  time  course  w'ith  the  simu¬ 
lated  three-cycle  reference  function,  on  the  other  hand, 
marked  as  active  (r  >  0.4)  only  a  small  proportion  of 
the  affected  voxels  (Fig.  20b,  middle  panel). 

To  test  w'hether  the  relative  insensitivity  of  correla¬ 
tional  analysis  in  this  instance  depended  upon  the 
selection  of  too  high  a  correlation  threshold,  w'e  re¬ 
duced  the  threshold  until  the  number  of  active  voxels 
detected  by  correlation  equaled  the  number  detected 
bv  ICA  (at  t:  >  0.197).  The  reduced  correlation  thresh¬ 
old  produced  the  active  voxel  map  shown  in  Figure 
20b,  bottom  panel.  In  this  map,  66%  of  the  active  voxels 
were  not  in  the  area  of  simulated  activation.  Thus,  ICA 
proved  both  more  sensitiy-e  and  more  specific  than 
correlation  in  detecting  the  areas  of  simulated  task- 
related  signal.  In  addition,  the  correlation  method 
required  that  the  experimenter  knew  the  time  course  of 
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the  simulated  task  activation,  whereas  ICA  determined 
and  separated  the  approximate  time  courses  of  both 
task-related  activations  without  any  a  priori  informa¬ 
tion  about  their  possible  time  courses. 

DISCUSSION 

Our  results  indicate  that  independent  component 
analysis  (ICA)  can  be  used  to  reliably  separate  tMRl 
data  sets  into  meaningful  constituent  components, 
including  consistently  and  transiently  task-related 
physiological  changes,  non  task-related  physiological 
phenomena,  and  machine  or  movement  artifacts.  For 
the  ICA  method  to  separate  task-related  activity'  from 
other  component  activitvv  the  spatial  distribution  of 
brain  areas  activated  by  task  performance  must  be 
spatially  independent  of  the  distributions  of  areas 
affected  by  artifact-  Confidence  in  this  assumption  is 
strengthened  by  the  result  that  the  time  courses  of  the 
CTR  components  in  six  Stroop  task  trials  more  dearly 
resembled  the  block  desigyi,  as  successively  stricter 
criteria  for  spatial  independence  were  applied  to  linear 
decompositions  of  the  data  (Figs.  8, 9).  The  time  course 
of  the  CTR  component  determined  by  PCA  did  not 
resemble  the  task  block  design  as  well  as  the  CTR 
components  from  the  fourth-order  or  higher-order  fCA 
algorithm  described  here.  The  algorithm  of  Bell  and 


Figure  15. 

Other  tCA  component  types.  Types  of  components  detected  by 
iCA  decomposition  (red,  z  £  2.0;  blue.  zS  --2.0).  Negative  z- 
values  mean  that  those  voxels  are  activated  opposite  to  the  plotted 
time  course,  (a)  Transiently  task-related  (TTR)  component  (sub¬ 
ject  I ,  trial  1 .  rank  33),  This  component  was  selectively  activated 
during  the  second  experimental  block.  The  dotted  line  shows  the 
time  course  of  the  consistently  task-related  component  for 
comparison,  (b)  Slowly  varying,  nontask-reiated  component  (sub¬ 
ject  I,  trial  i,  rank  12).  The  active  region  for  this  component  was 
the  ventricular  system.  The  lower  trace  (dotted  line)  shows  the 
mean  time  course  of  the  active  voxels  (z  >  2.0).  (c)  Quasiperiodic 
component.  This  component  (subject  I,  trial  4.  rank  40)  was 
mostly  active  in  a  single  slice  and  had  a  dominant  period  of  about  14 
sec.  Other  similar  components  were  active  in  other  slices.  The 
spatial  distributions  of  such  components  were  highly  reproducible 
between  trials,  (d)  Suspected  abrupt  head  movement  (subject  I , 
trial  4,  rank  1 8).  The  right-temporal  pattern  of  active  voxels  for  this 
component  could  be  produced  by  a  small,  abrupt  torsional  head 
hiovemeni.  (e)  Suspected  gradual  head  movement  (subject  ! ,  trial 
4,  rank  20).  The  lefc/right  ring-iike  pattern  of  active  voxels, 
together  with  the  monotonic  time  course,  suggests  a  slow  head 
shift  (f)  Residual,  "noisy”  component  (subject  I,  trial  4,  rank  69). 
Almost  all  the  s.mallest  ICA  components  were  of  this  type  and 
appear  to  represent  noise  in  the  data. 


Sejnowski  [1995]  is  computationally  more  efficient 
than  the  technique  of  Comon  [1994]  and  converges  to 
quite  similar  decompositions,  independent  of  the  ini¬ 
tial  weights  and  random  seed  used  in  the  training 
[Makeig  et  al,  1997].  Our  simulations  (Fig.  18)  indicate 
that  the  results  are  robust  in  the  presence  of  noise  in 
the  data.  Furthermore,  as  the  ICA  model  gives  a  linear 
decomposition  of  the  data,  its  results  are  easy  to 
manipulate.  Even  for  experiments  with  a  simple  re¬ 
peated  block  design,  the  IC.A  method  appears  to  be 
more  sensitive  in  detecting  task-related  activation  than 
correlating  with  an  idealized  reference  function.  Thus, 
we  detected  variable  frontal  activation  in  the  CTR 
component  of  all  3  subjects  performing  the  Stroop  task, 
which  was  in  some  cases  undetectable  by  correlation 
methods  (Fig.  8).  Several  transiently  task-related  com¬ 
ponents  also  demonstrated  dorsolateral  prefrontal  ac- 
tivih^  (Fig.  17),  which  would  be  unlikely  to  be  detected 
by  correlational  methods  because  their  time  courses 
could  not  be  known  in  advance.  Variable  frontal 
activation  during  sustained  or  repeated  task  perfor¬ 
mance  has  been  reported  in  several  previous  PET  and 
fMRl  studies,  and  may  relate  to  changes  in  subject 
visual-spatial  attention  [Nobre  et  al.,  1997],  language 
processing  [Binder,  1997],  changes  in  stimulus  novelty 
[Tulving  et  al.,  1996],  verbal  fluency  [Phelps  et  al, 
1997],  verbal  suppression  [Nathaniel-James  et  al.,  1997], 
and  working  memory'  [Manoach  et  al,  1997],  all  of 
w'hich  may  be  required  for  repeated  Stroop  task  perfor¬ 
mance  in  either  normal  or  impaired  subjects. 

ICA  also  produced  quasiperiodic  components  many 
wdth  periods  betvi'een  10-20  sec  (Fig.  15c).  As  it  is  not 
possible  to  bandpass-limit  BOLD  signals  prior  to  data 
collection,  periodic  signal  changes  faster  than  0.2 
sec, f cycle  (for  the  given  sampling  interval  of  2.5  sec) 
are  "aliased"  back  into  the  captured  signal,  and  may 
appear  in  any  frequency  range.  Quasiperiodic  fMRI 
signal  fluctuations,  therefore,  might  be  caused  by 
aliased  cardiac  (~l/sec)  and  respirator^'  (~]/4  sec) 
rhvthms  [Biswal  et  al,  1996;  Kwong,  1995;  Le  and  Hu, 
1996;  Mitra  et  al,  1997).  Much  slower  cerebrovascular 
waves,  presumed  to  be  due  to  autoregulatory  feedback 
of  cerebrovasculature  [Chichibu  et  al,  1995;  Wayen- 
herg  et  al,  1995],  may  also  be  a  potential  mechanism 
for  producing  fluctuating  BOLD  signal  changes,  Trans¬ 
mitted  pulsatile  movements  may  also  precipitate  a 
BOLD  signal  response  throughout  the  whole  brain  via 
induced  pressure  changes,  while  blood  flow  effects  are 
usually  local  to  the  great  vessels  [Kwong,  1995], 

The  single-slice  appearance  of  the  quasiperiodic 
components  in  all  trials  might  possibly  also  reflect  the 
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Figure  16. 

Summation  of  ICA  components.  Most  brain  voxels  were  active  (i.e.,  had  map  values  of  izj  >  2)  m  1-6 
ICA  components  (mean.  3.  i  9).  Here,  one  voxel  in  a  posterior  visual  association  area  participates 
strongly  in  the  CTR  component  for  this  trial  (z  =  S.O)  as  well  as  in  two  other  larger  (lower-rank) 
components  and  one  smaller  (higher-rank)  component. 


Spin-excitation  history  used  in  the  acquisition  of  a 
brain  slice.  Attempts  have  been  made  to  explicitly 
model  spin-excitation  history'  to  counteract  this  artifact 
[Friston  et  ai,  1996].  However,  as  ICA  reliably  and 
consistently  extracted  and  separated  quasiperiodic 
components  from  other  components,  the  ICA  tech¬ 
nique  might  circumvent  the  need  to  explicitly  model 
spin-excitation  history  for  this  purpose. 

The  ICA  method  assumes  that  the  observ'ed  fMRI 
data  are  the  linear  sum  of  components  with  unique 
(though  possibly  correlated)  time  courses  and  statisti¬ 
cally  independent  distributions  of  map  voxel  values. 
The  method  can  be  viewed  as  a  version  of  the  "general 
linear  model"  [Friston,  1996J  currenth'  used  in  func¬ 
tional  neuroimaging  and  given  by 

X  =  Gp  +  e 


where  X  is  a  data  matrix  with  elements  Xij  (the 
observation  of  the  j*  voxel  at  the  i*  time),  G  is  a 
"design  matrix"  specifying  the  time  courses  of  all  the 
factors  hypothesized  to  be  present  in  the  observed  data 
(e.g.,  the  task  reference  function,  or  a  linear  trend),  3  is 
a  matrix  of  map  voxel  values  for  each  hypothesized 
factor,  and  e  is  a  matrix  of  noise  or  residual  modeling 
errors.  Given  this  linear  model  and  a  design  matrix  G, 
the  maps  (J  can  be  found  by  least  squares  estimation.  In 
contrast,  the  ICA  method  extracts  intrinsic  spatially- 
independent  components  of  the  observed  data  and 
determines  explicitly  their  time  courses,  rather  than 
relying  on  a  priori  hypotheses  as  to  what  they  should 
be.  The  need  for  procedures  that  involve  splitting  an  a 
priori  design  matrix  G  into  parameters  of  interest  and 
of  no  interest,  in  an  attempt  to  increase  signai-to--noise 
ratio  JFriston,  1996],  might  thus  be  reduced  by  using 
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the  ICA  technique.  Our  results  demonstrate  that  ICA 
can  extract  both  transiently  and  consistently  task- 
related,  nontask-related,  and  artifactual  components 
without  a  priori  knowledge  of  their  temporal  or  spatial 
structure.  This  property  of  the  ICA  algorithm  w-arrants 
its  description  as  providing  'l^lind  separation"  of  the 
data  into  spatially  independent  components. 

Although  the  algorithm  is  capable  of  "blind  separa¬ 
tion"  into  independent  components,  the  subsequent 
interpretation  of  the  separated  components  requires 
additional  knowledge  on  the  part  of  the  experimenter. 
In  the  current  trials,  w-hich  utilized  a  task  block  design, 
the  one  component  for  each  trial  that  appeared  to  be 
consistently  task-related  was  easily  found  by  compar¬ 
ing  the  component  time  courses  to  the  task  reference 
hmction.  Although  we  did  not  know'  a  priori  the  exact 
hme  course  of  activation,  rough  knowdedge  of  the  task 
block  design  w'as  required  to  identify  the  appropriate 
component.  We  are  currently  investigating  heuristic 
approaches  for  objectiv'e  classification  of  the  separated 


components.  For  example,  the  ring-like  spatial  struc¬ 
tures  of  some  components  are  suggestive  of  head 
movement,  and  the  erratic  temporal  and  spatial  pat¬ 
terns  of  other  components  suggest  that  these  may 
repre.sent  noise  in  the  signal. 

If  one  or  more  of  the  ICA  components  derived  for  a 
given  data  set  are  identified  as  artifactual,  it  is  possible 
to  reconstruct  the  data  wdth  these  components  re¬ 
moved  by  zeroing  the  appropriate  row's  of  C  in 
Equation  (5).  This  potentially  allows  IC.A  to  be  used  as 
a  preprocessing  step  prior  to  further  analysis  by  any 
other  technique.  However,  movement  artifacts  cannot 
be  totally  removed  by  this  method,  as  the  changes  in  a 
voxel's  signal  activity  due  to  encroachment  of  a  neigh¬ 
boring  x'oxei  during  movement  are  a  violation  of  the 
assumption  made  by  ICA  that  the  maps  are  spatially 
stationary'.  Further  work  is  needed  to  determine  how' 
the  movement-related  components  can  be  used  to  readjust 
the  raw  data  to  eliminate  the  detected  movement  or  errors 
in  registration  for  subsequent  reanalysis. 


Figure  17. 

Task-related  components  for  one  trial  (subject  3,  trial  2),  The  consistently  task-related  (CTR) 
component  map  and  its  associated  time  course  are  shown  at  bottom.  Other  components  had  time 
courses  (yeilow  fines)  that  appeared  time-focked  to  the  CTR  for  that  trial  (white  lines)  during  part  of 
the  trial  (blue  rectangies),  and  so  could  be  called  transiently  task-related  (TTR), 
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Figure  i8. 

Robustness  to  added  noise.  The  ICA  method  separated  a  singje 
CTR  component  even  after  the  addition  of  Gaussian  noise  to  each 
voxel  time  course  at  levels  25%,  50%,  75%,  and  1 00%  of  baseline. 


Although  the  ICA  method  appears  to  be  useful  for 
flvlRI  data  analysis,  it  also  has  same  inherent  limita¬ 
tions.  First,  flViRI  signal  component  processes  may 
exhibit  saturation  or  other  nonlinear  properties  and 
thus  may  not  be  appropriate  for  analysis  using  a 
wholly  linear  model.  However,  since  the  task-related 
components  in  these  and  other  experiments  generally 
make  small  contributions  to  the  baseline  BOLD  signal, 
an  assumption  of  additivity'  may'  be  reasonable  [Boyn¬ 
ton  et  al.,  1996].  Second,  the  ICA  algorithm  assumes 
that  the  distribution  of  voxel  values  specifying  the 
map  for  each  signal  component  is  statistically  indepen¬ 
dent  of  the  distributions  of  voxel  values  specifying  all 
the  other  component  maps.  Although  this  criterion 
provides  an  essentially  unique  decomposition  of  the 
data,  it  may  not  necessarily  be  the  desired  representa¬ 
tion  for  ail  purposes.  The  spatial  independence  crite¬ 
rion,  together  with  the  particular  (here,  logistic)  nonlin¬ 
earity  used  in  the  algorithm,  biases  the  ICA  method 
towards  finding  components  having  relatively  sparse 
as  well  as  discrete  active  component  areas  [McKeowm 
et  al.,  1998].  If  some  component  process  produces 


proportional  signal  changes  over  a  relatively  large  part 
of  the  brain,  the  ICA  method  used  here  might  split  the 
effects  of  this  process  into  several  ICA  components 
with  smaller  active  areas  and  closely  related  time 
courses.  Similarly,  if  two  component  processes  contrib¬ 
ute  to  the  observed  fNlRI  signals  in  well-overlapping 
brain  areas,  ICA  may  split  the  resulting  activity'  into 
three  or  more  components,  one  component  represent¬ 
ing  the  combined  effects  of  the  two  factors  in  the 
regions  of  overlap,  and  two  others  representing  the 
regions  affected  by  just  one  of  the  two  processes.  This 
may  partly  explain  the  numerous  TTR  components 
detected  by  the  technique  [McKeown  et  al.,  1998], 
Nevertheless,  the  optimum  way  to  describe  the  vary'- 
ing  spatial  extent  of  time-dependent,  task-related  acti¬ 
vations  detected  in  ffvIRl  data  is  unclear.  Combining 
the  task-related  (CTR  and  TTR)  ICA  components  (Fig. 
17)  may  provide  a  practical,  if  somewhat  cumbersome, 
nrethod. 

One  benefit  of  the  ICA  technique  is  the  ability  to 
discern  activations  that  could  not  be  predicted  in 
advance  of  the  experiment,  e.g.,  TTR  activations.  It  is 
possible  that  TTR  components,  during  times  when 
they  are  not  time-locked  to  the  experiment,  represent 
cognitive  systems  indirectly  related  to  task  perfor¬ 
mance,  e.g.,  arousal  or  alertness.  The  ICA  approach 
s.hould  also  be  highly  promising  for  investigations  of 
patients  writh  pathological  conditions  that  may  alter 
the  latencies,  amplitudes,  and  brain  distributions  of 
their  fMRI  signals  in  unpredictable  ways. 

There  are  several  issues  about  the  ICA  decomposi¬ 
tion  of  fMRI  data  that  still  need  to  be  addressed:  1)  The 
smallest  ICA  components,  particularly  those  with 
speckled  spatial  distributions,  appear  to  be  noise  of 
unknown  origin.  As  yet,  we  do  not  know'  w'hat  propor¬ 
tion  of  a  given  component  is  physiological  signal  or 
identifiable  artifact,  and  what  is  noise.  2)  The  assump¬ 
tion  that  the  component  maps  are  spatially  stationary 
makes  the  method  sensitive  to  the  detection  of  move¬ 
ment  artifact,  but  does  not,  in  its  current  form,  allow 
for  the  straightforward  correction  of  suspected  head 
movements.  3)  Methods  for  testing  the  statistical  reli¬ 
ability  of  ICA  component  time  courses  and  areas  of 
activation  need  to  be  developed.  The  ability  of  the 
algorithm  to  converge  to  equivalent  components,  us¬ 
ing  data  from  a  subset  of  time  points  from  a  trial  (Fig. 
19),  suggests  that  "jackknife"  or  other  bootstrap  meth¬ 
ods  can  be  employed  to  determine  levels  of  statistical 
significance  for  the  voxel  values  in  a  map.  In  such 
approaches,  components  computed  from,  framing  on 
subsets  of  time  points  are  compared  to  estimate  the 
robustness  of  the  statistics  derived  from  the  complete 
data  set  (e.g.,  component  map  values  and  time  courses). 
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Figure  1 9. 

Stability  of  the  consistently  task-related  component.  Data  from  analyses  returned  one  component  with  a  square-wave  time  course 

one  trial  (subject  2,  trial  I)  were  divided  into  two  subsets,  one  closely  matching  that  of  the  CTR  component  in  the  analysis  of  the 

containing  the  odd-numbered  time  points  and  the  other  the  whole  trial  (bottom  and  right).  Map  voxel  values  for  the  square- 

even-numbered  time  points.  ICA  was  performed  separately  on  the  wave  component  in  each  subset  analysis  were  highly  correlated 

two  data  subsets  and  compared  with  results  of  ICA  decomposition  with  each  other  and  with  the  map  values  of  the  CTR  component  in 

of  the  whole  data  set  (upper  left).  Each  of  the  two  data-subset  the  whole-data  analysis  (scatter  plots  clipped  to  ”  5  <  z  <  10). 

The  ICA  algorithm  appears  to  provide  a  powerful  [Kvrong  et  al.,  1992;  Bandettini  et  al.,  1992].  ICA  is  at 

method  for  exploratory  analysis  of  fMRI  data  in  both  least  as  sensitiv-e  as  correlation  or  PCA  in  finding 

clinical  and  normal  subject  populations.  It  makes  no  task-related  activations,  and  can  isolate  potentially 

assumptions  about  the  hemodynamic  activation  func-  significant  phenomena  in  the  data  while  canceling  out 

hon,  which  may  vary  across  time  and  brain  areas  artifacts,  using  only  minimal  assumptions  about  the 
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spatiotemporal  structure  of  the  component  signals.  In 
addition,  the  ICA  method  may  allow  straightfon\urd 
analysis  of  more  complex  brain  imaging  experiments 
in  wkch  unpredictable  changes  m  cognitive  activation 
occur  in  parallel  with  changes  in  arousal  or  autonomic 
states  for  which  the  exact  time  courses  of  activation  are 
also  unknown. 
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Figure  20. 

(a)  ICA  detection  of  simulated  head  movement.  The  figure  shows 
that  the  largest  component  found  by  ICA  after  an  abrupt  head 
movement  was  simulated  halfway  through  the  time  course  of  one 
trial  by  shifting  the  spatial  structure  of  the  data  diagonally  by  one 
voxel  (4.2  mm).  The  time  course  of  this  component  dearly  mirrors 
the  time  course  of  the  simulated  movement.  The  active  voxels  of 
this  component  are  those  having  the  largest  expected  signal  change 
during  the  movement  (the  contrast  of  the  structural  MRI  has  been 
adjusted  for  clarity),  (b)  Detection  by  ICA  of  a  simulated  task 
component,  top:  A  small  simulated  task-related  signal  (left) 
consisting  of  three  square  waves  (instead  of  four^  was  either  added 
to  (red)  or  subtracted  from  (blue)  the  data  from  one  of  the  Stroop 
trials  (subject  2.  trial  I )  in  arbitrarily  selected  portions  of  two  of  the 
functional  slices,  middle:  Simulated  signal  variance  was  only  0.68/i 
of  the  mean  variance  of  the  arbitrarily  selected  active  voxels  (upper 
left).  ICA  decomposition  of  the  simulated  data  recovered  two 
components  whose  time  courses  (left)  resembled  three  square 
waves  and  four  square  waves,  respectively.  Maps  of  active  voxels 
(  z  >  2)  for  the  three-square-wave  component  accurately  identi¬ 
fied  the  locations  and  polarities  of  the  simulated  active  areas 
(right),  with  only  two  false-positive  outlying  voxels  (slice  two, 
bottom  left).  Correlating  the  simulated  data  with  a  three-square- 
wave  reference  function  and  using  a  standard  correlation  threshold 
(ir  >  0,4)  detected  only  10.7%  of  the  simulated  active  voxels, 
bottom:  When  the  correlation  threshold  was  reduced  until  the 
number  of  active  voxels  found  by  both  methods  was  the  same 
(n  =  195.  ;r;>  0.197),  only  67  (34,4%)  of  the  active  voxels 
selected  by  correlation  were  in  the  simulated  active  areas,  while 
1 28  (6S.6?4)  were  false  positives. 
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The  public  reporting  burden  for  this  coilection  of  informafion  is  estimafed  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data 
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Current  analytical  techniques  applied  to  functional  magnetic  resonance  imaging  (fMRI)  data  require  a  priori 
knowledge  or  specific  assumptions  about  the  time  courses  of  processes  contributing  to  the  measured  brain  electrical 
signals.  Here  we  describe  a  new  method  for  analyzing  fMRI  data  based  on  the  independent  component  analysis 
(ICA)  algorithm  of  Bell  and  Sejnowski  ([1995]^  Neural  Comput  7H129-1159).  We  decomposed  eight  fMRI  data  sets 
from  4  normal  subjects  performing  various  cognitive  tasks,  by  utilizing  higher-order  statistics  to  enforce 
successively  stricter  criteria  for  spatial  independence  between  component  maps,  both  the  ICA  algorithm  ad  a 
related  fourth-order  decomposition  technique  were  superior  to  principal  component  analysis  (PCA)  in  determining 
the  spatial  and  temporal  extent  of  task-related  activation.  ICA  appears  to  be  a  highly  promising  method  for  he 
analysis  of  fMRI  data  from  normal  and  clinical  populations. 
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